ZOMA Biography

My name is Wendyam Rodrigue ZOMA. I was born on February 24, 1996, and I come from the Central West region, specifically from Koudougou. I am currently in my bachelor's (licence) year at an institute of technology. I like my studies: they are diverse and dynamic, and I clearly enjoy what I do. The field I like most is networking!

Since I was little, I have had a taste for travelling! For a long time, I spent my vacations in Bobo Dioulasso and Banfora. I fell in love with Banfora for its beauty, its nature and the serenity it brings me. Then I discovered the Ivory Coast thanks to a “free” religious trip! It is a country that fascinates me with its many facets, its rich cultural life and its very nice people! I love the city of Koudougou: it is my city, my favorite, a city where I feel good and where there is brotherhood!

My first visit to Leo’s clinic, www.cliniquesedogo.com

Task 2: gapminder country comparison

You have seen the gapminder dataset, which has data on life expectancy, population, and GDP per capita for 142 countries from 1952 to 2007. To get a glimpse of the dataframe, namely to see the variable names, variable types, etc., we use the glimpse function. We also want to have a look at the first 20 rows of data.

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", ~
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, ~
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, ~
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8~
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12~
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, ~
head(gapminder, 20) # look at the first 20 rows of the dataframe
## # A tibble: 20 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## 11 Afghanistan Asia       2002    42.1 25268405      727.
## 12 Afghanistan Asia       2007    43.8 31889923      975.
## 13 Albania     Europe     1952    55.2  1282697     1601.
## 14 Albania     Europe     1957    59.3  1476505     1942.
## 15 Albania     Europe     1962    64.8  1728137     2313.
## 16 Albania     Europe     1967    66.2  1984060     2760.
## 17 Albania     Europe     1972    67.7  2263554     3313.
## 18 Albania     Europe     1977    68.9  2509048     3533.
## 19 Albania     Europe     1982    70.4  2780097     3631.
## 20 Albania     Europe     1987    72    3075321     3739.

Your task is to produce two graphs of how life expectancy has changed over the years for the country and the continent you come from.

I have created the country_data and continent_data with the code below.

country_data <- gapminder %>% 
            filter(country == "Burkina Faso") # choosing Burkina Faso, as this is where I come from

continent_data <- gapminder %>% 
            filter(continent == "Africa")
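
Before plotting, it can help to confirm that the two filters kept what was intended. The following is a minimal sketch, assuming the gapminder and dplyr packages are installed; the checks themselves are an addition and not part of the original assignment.

```r
library(dplyr)
library(gapminder)

# Same filters as in the assignment
country_data <- gapminder %>%
  filter(country == "Burkina Faso")

continent_data <- gapminder %>%
  filter(continent == "Africa")

# One row per survey year is expected (1952 to 2007 in five-year steps)
nrow(country_data)

# Number of distinct countries kept by the continent filter
n_distinct(continent_data$country)

# The chosen country should belong to the chosen continent
"Burkina Faso" %in% continent_data$country
```

If the country name were misspelled, filter() would silently return zero rows, so a quick nrow() check catches the mistake before an empty plot does.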

First, create a plot of life expectancy over time for the single country you chose. You should use geom_point() to see the actual data points and geom_smooth(se = FALSE) to plot the underlying trendlines. You need to remove the comments # from the lines below for your code to run.

 plot1 <- ggplot(data = country_data, mapping = aes(x = year, y = lifeExp))+
   geom_point() +
   geom_smooth(se = FALSE)+
   NULL 

 plot1
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Next we need to add a title. Create a new plot, or extend plot1, using the labs() function to add an informative title to the plot.

 plot1 <- ggplot(data = country_data, mapping = aes(x = year, y = lifeExp))+
   geom_point() +
   geom_smooth(se = FALSE) +
   labs(title = "Life expectancy in Burkina Faso",
       x = "Year",
       y = "Life expectancy (years)") +
   NULL


 print(plot1)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Secondly, produce a plot for all countries in the continent you come from. (Hint: map the country variable to the colour aesthetic).

 ggplot(data = continent_data, mapping = aes(x = year, y = lifeExp, colour = country)) +
   geom_point() +
   geom_smooth(se = FALSE) +
   NULL
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Finally, using the original gapminder data, produce a life expectancy over time graph, grouped (or faceted) by continent. We will remove all legends by adding theme(legend.position="none") at the end of our ggplot call.

 ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, colour = country)) +
   geom_point() +
   geom_smooth(se = FALSE) +
   facet_wrap(~continent) +
   theme(legend.position = "none") + # remove all legends
   NULL
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
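
Because colour is mapped to country, the faceted plot draws one loess curve per country, which can make the overall continental trend hard to read. As a hedged complement, the sketch below collapses the data to one median value per continent and year; the names continent_summary, median_lifeExp and plot_summary are illustrative and do not appear in the assignment.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Median life expectancy for each continent in each survey year
continent_summary <- gapminder %>%
  group_by(continent, year) %>%
  summarise(median_lifeExp = median(lifeExp), .groups = "drop")

# One line per continent makes the trends directly comparable
plot_summary <- ggplot(continent_summary,
                       aes(x = year, y = median_lifeExp, colour = continent)) +
  geom_line() +
  geom_point() +
  labs(title = "Median life expectancy by continent",
       x = "Year",
       y = "Median life expectancy (years)")

plot_summary
```

The median is used here rather than the mean so that a few countries with unusual values do not dominate a continent's line; this is a design choice, not a requirement of the task.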

Given these trends, what can you say about life expectancy since 1952? Again, don’t just say what’s happening in the graph. Tell some sort of story and speculate about the differences in the patterns.

Type your answer after this blockquote.

Task 3: Brexit voting

We will have a quick look at the results of the 2016 Brexit vote in the UK. First we read the data using read_csv() and take a quick glimpse at it.

brexit_results <- read_csv(here::here("Data","brexit_results.csv"))


glimpse(brexit_results)
## Rows: 632
## Columns: 11
## $ Seat        <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and Sale W~
## $ con_2015    <dbl> 50.592, 52.050, 52.994, 43.979, 60.788, 22.418, 52.454, 22~
## $ lab_2015    <dbl> 18.333, 22.369, 26.686, 34.781, 11.197, 41.022, 18.441, 49~
## $ ld_2015     <dbl> 8.824, 3.367, 8.383, 2.975, 7.192, 14.828, 5.984, 2.423, 1~
## $ ukip_2015   <dbl> 17.867, 19.624, 8.011, 15.887, 14.438, 21.409, 18.821, 21.~
## $ leave_share <dbl> 57.89777, 67.79635, 38.58780, 65.29912, 49.70111, 70.47289~
## $ born_in_uk  <dbl> 83.10464, 96.12207, 90.48566, 97.30437, 93.33793, 96.96214~
## $ male        <dbl> 49.89896, 48.92951, 48.90621, 49.21657, 48.00189, 49.17185~
## $ unemployed  <dbl> 3.637000, 4.553607, 3.039963, 4.261173, 2.468100, 4.742731~
## $ degree      <dbl> 13.870661, 9.974114, 28.600135, 9.336294, 18.775591, 6.085~
## $ age_18to24  <dbl> 9.406093, 7.325850, 6.437453, 7.747801, 5.734730, 8.209863~

The data comes from Elliott Morris, who cleaned it and made it available through his DataCamp class on analysing election and polling data in R.

Our main outcome variable (or y) is leave_share, which is the percent of votes cast in favour of Brexit, or leaving the EU. Each row is a UK parliament constituency.

To get a sense of the spread of the data, plot a histogram and a density plot of the leave share in all constituencies.

ggplot(brexit_results, aes(x = leave_share)) +
  geom_histogram(binwidth = 2.5) +
  labs(title = "Histogram of leave share across constituencies",
       x = "Leave share (%)",
       y = "Number of constituencies")

ggplot(brexit_results, aes(x = leave_share)) +
  geom_density() +
  labs(title = "Density of leave share across constituencies",
       x = "Leave share (%)",
       y = "Density")
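
A histogram counts observations per bin, while a density curve integrates to one, so the two sit on different vertical scales. One way to show them together is to rescale the histogram to density units with after_stat(density). The sketch below is only an illustration: the toy data frame is synthetic (random numbers, not the real brexit_results), and the name overlay is hypothetical.

```r
library(ggplot2)

# Synthetic stand-in for brexit_results: made-up values for illustration only
set.seed(42)
toy <- data.frame(leave_share = rnorm(500, mean = 52, sd = 11))

# Histogram rescaled to density units, with the density curve drawn on top
overlay <- ggplot(toy, aes(x = leave_share)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 2.5, fill = "grey80", colour = "white") +
  geom_density() +
  labs(title = "Histogram and density of leave share (synthetic data)",
       x = "Leave share (%)",
       y = "Density")

overlay
```

With the real data, replacing toy by brexit_results should give the combined view of the two plots above.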

One common explanation for the Brexit outcome was fear of immigration and opposition to the EU’s more open border policy. We can check the relationship (or correlation) between the proportion of native born residents (born_in_uk) in a constituency and its leave_share. To do this, let us compute the correlation between the two variables:

brexit_results %>% 
  select(leave_share, born_in_uk) %>% 
  cor()
##             leave_share born_in_uk
## leave_share   1.0000000  0.4934295
## born_in_uk    0.4934295  1.0000000

The correlation is almost 0.5, a moderate positive correlation: constituencies with a higher share of UK-born residents tended to have a higher leave share.

We can also create a scatterplot between these two variables using geom_point. We also add the best fit line, using geom_smooth(method = "lm").

ggplot(brexit_results, aes(x = born_in_uk, y = leave_share)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  labs(title = "Leave share versus share of UK-born residents",
       x = "Residents born in the UK (%)",
       y = "Leave share (%)") +
  theme_bw() +
  NULL
## `geom_smooth()` using formula 'y ~ x'
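
The correlation coefficient and the best-fit line describe the same linear relationship. For a simple linear regression, the slope equals r multiplied by sd(y) / sd(x), and r squared is the share of variance in y explained by the line; with r of about 0.49 as reported above, that share is roughly 0.24. The sketch below checks both identities on synthetic data, since the CSV file is not bundled here. The toy data frame and its coefficients are made up for illustration and are not the real Brexit results.

```r
# Synthetic stand-in for brexit_results: made-up values for illustration only
set.seed(1)
toy <- data.frame(born_in_uk = runif(100, min = 40, max = 100))
toy$leave_share <- 20 + 0.4 * toy$born_in_uk + rnorm(100, sd = 10)

# Pearson correlation, as computed with cor() in the text
r <- cor(toy$leave_share, toy$born_in_uk)

# Least-squares line, the same fit that geom_smooth(method = "lm") draws
fit <- lm(leave_share ~ born_in_uk, data = toy)
slope <- unname(coef(fit)["born_in_uk"])

# Identity 1: slope = r * sd(y) / sd(x)
all.equal(slope, r * sd(toy$leave_share) / sd(toy$born_in_uk))

# Identity 2: r squared equals the R-squared of the fitted line
all.equal(r^2, summary(fit)$r.squared)
```

Both comparisons use all.equal() rather than == because floating-point results can differ in the last digits.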

You have the code for the plots, I would like you to revisit all of them and use the labs() function to add an informative title, subtitle, and axes titles to all plots.

What can you say about the relationship shown above? Again, don’t just say what’s happening in the graph. Tell some sort of story and speculate about the differences in the patterns.

Type your answer after, and outside, this blockquote.

Submit the assignment

Knit the completed R Markdown file as an HTML or Word document (use the “Knit” button at the top of the script editor window) and upload it to Canvas.

Details

If you want to, please answer the following

  • Who did you collaborate with: TYPE NAMES HERE
  • Approximately how much time did you spend on this problem set: ANSWER HERE
  • What, if anything, gave you the most trouble: ANSWER HERE

Prati